Vocal Joystick: New Software Program

Group 3

Jeremy Moody, Carrie Chudy

Key Words: vocal parameters, acoustic signal processing, pattern recognition, motion control

Vocal Joystick is a software program that will enable individuals with motor impairments to control objects on a computer screen, and ultimately electro-mechanical instruments, simply by using vocal parameters. Its primary users are people with physical impairments that limit the use of their arms and hands. The program translates vocal cues into directional movements.

Vocal Joystick was developed by Jeff Bilmes, an associate professor of electrical engineering at the University of Washington. He states that current speech recognition software is an attempt to replace the keyboard, but that there had not been much work “to essentially replace the mouse, using your voice.” Bilmes notes that many people have complete control over their voice but not their arms or hands. He believes there are several reasons why Vocal Joystick might be a better approach than brain-computer interfaces.

Vocal Joystick utilizes three main components: acoustic signal processing, pattern recognition, and motion control. First, the signal processing module extracts short-term acoustic features, such as energy, autocorrelation coefficients, linear prediction coefficients, and mel-frequency cepstral coefficients (MFCCs). Signal conditioning and analysis techniques are needed to estimate these features accurately. Next, the features are passed to the pattern recognition module, where energy smoothing, pitch and formant tracking, vowel classification, and discrete sound recognition take place. Finally, energy, pitch, vowel quality, and discrete sounds become the acoustic parameters that are transformed into direction, speed, and other motion-related parameters. The application driver takes these motion control parameters and launches the corresponding actions.
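The first stage described above can be illustrated with a small sketch. The Python code below is not taken from Vocal Joystick itself; it assumes a single frame of audio samples held in a list and shows how two of the named short-term features, energy and a pitch estimate derived from autocorrelation coefficients, might be computed. The function names, the 8000 Hz sample rate, the 50 ms frame length, and the 80 to 400 Hz search range are all illustrative assumptions.

```python
import math

def frame_energy(frame):
    """Short-term energy: mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def autocorrelation(frame, lag):
    """Unnormalized autocorrelation coefficient of the frame at one lag."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate pitch as the lag with the largest autocorrelation.

    The search range (fmin to fmax) is an assumed range for voiced sounds,
    not a value taken from the Vocal Joystick system.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    return sample_rate / best_lag

# Synthetic test frame: a 200 Hz sine wave standing in for a sustained vowel.
SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME_LEN = 400      # 50 ms of audio (assumed)
frame = [0.5 * math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE)
         for n in range(FRAME_LEN)]

energy = frame_energy(frame)
pitch_hz = estimate_pitch(frame, SAMPLE_RATE)
print(f"energy={energy:.3f}, pitch={pitch_hz:.1f} Hz")
```

A real system would run this on overlapping frames from a microphone and smooth the results over time, which corresponds to the energy smoothing and pitch tracking steps of the pattern recognition module.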

Vocal Joystick is ultimately intended to be language independent, user-friendly, and flexible in its application. The vocalizations should be drawn from a set that minimizes the possibility of repetitive-use strain and maximizes ease of use. Across the world’s languages, continuous sounds can be drawn from three main classes: vocalic (vowel-like), pitch (rate of vocal fold vibration), and intensity. Vocalic signals, manipulations of pitch, and manipulations of intensity are found as quasi-independent but co-existing elements in every spoken language.
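Because these channels are quasi-independent, each one can drive a separate motion parameter. The Python sketch below is a hypothetical illustration, not the program's actual mapping: it assumes that a vowel label selects a direction and that intensity (frame energy) sets the speed. The vowel-to-direction table, the energy floor, the gain, and the speed cap are all invented for the example.

```python
# Hypothetical vowel-to-direction table (unit vectors in screen coordinates,
# with y increasing downward). The real assignment may differ.
VOWEL_DIRECTIONS = {
    "a": (0.0, -1.0),   # up
    "u": (0.0, 1.0),    # down
    "i": (1.0, 0.0),    # right
    "o": (-1.0, 0.0),   # left
}

def motion_vector(vowel, energy, energy_floor=0.01, gain=200.0, max_speed=50.0):
    """Map one analysis frame to a cursor displacement (dx, dy) in pixels.

    vowel  -- label produced by a vowel classifier (assumed to exist upstream)
    energy -- short-term energy of the frame; louder means faster
    All numeric defaults are illustrative assumptions.
    """
    if vowel not in VOWEL_DIRECTIONS or energy < energy_floor:
        return (0.0, 0.0)   # silence or an unknown sound: no movement
    speed = min(gain * (energy - energy_floor), max_speed)
    ux, uy = VOWEL_DIRECTIONS[vowel]
    return (ux * speed, uy * speed)

# A quiet frame is ignored, a loud "i" moves right at the capped speed.
quiet = motion_vector("a", 0.001)
loud_right = motion_vector("i", 1.0)
print(quiet, loud_right)
```

Keeping direction and speed on separate channels means a user can hold one vowel and change only loudness to speed up or slow down, which fits the goal of a small, low-strain set of vocalizations.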

Vocal Joystick will impact businesses because it gives people with motor impairments of the arms and hands the opportunity to use a computer. Employers will be able to provide these employees with a system that lets them complete their business tasks efficiently and inexpensively. Because computers are a large part of daily life, not just the business world, this software would give many individuals the opportunity to succeed at jobs that require the use of a computer.

Vocal Joystick is an emerging technology that will benefit both the public and private sectors. The only required equipment is a standard microphone, a computer with a standard sound card, and the free software, so anyone with vocal ability can use this tool.